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Misspecifying the Shape of a Random 
Effects Distribution: Why Getting It 
Wrong May Not Matter 



Charles E. McCulloch and John M. Neuhaus 



Abstract. Statistical models that include random effects are commonly 
used to analyze longitudinal and correlated data, often with strong and 
parametric assumptions about the random effects distribution. There 
is marked disagreement in the literature as to whether such parametric 
assumptions are important or innocuous. In the context of generalized 
linear mixed models used to analyze clustered or longitudinal data, we 
examine the impact of random effects distribution misspecification on 
a variety of inferences, including prediction, inference about covariate 
effects, prediction of random effects and estimation of random effects 
variances. We describe examples, theoretical calculations and simula- 
tions to elucidate situations in which the specification is and is not 
important. A key conclusion is the large degree of robustness of maxi- 
mum likelihood for a wide variety of commonly encountered situations. 

Key words and phrases: Maximum likelihood, mixed models, para- 
metric modeling. 



1. INTRODUCTION 

Statistical models that include random effects are 
commonly used to analyze longitudinal and clus- 
tered data. When generalized linear mixed models 
(McCulloch, Searle and Neuhaus, 2008) and maxi- 
mum likelihood estimation are used to analyze such 
data, strong, parametric assumptions about the ran- 
dom effects distribution are typically made. Are in- 
ferences sensitive to this specification? 

One body of research has indicated that the im- 
pact of misspecification of the shape of the ran- 
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dom effects distribution is slight, especially for esti- 
mating regression parameters other than the inter- 
cept (e.g., Neuhaus, Hauck and Kalbfleisch, 1992; 
Neuhaus, Kalbfleisch and Hauck, 1994; Butler and 
Louis, 1992; Heagerty and Kurland, 2001). 

However, a number of authors have claimed sensi- 
tivity to parametric specification of a random effects 
distribution. An oft-quoted article is that of Heck- 
man and Singer (1984), which (in describing per- 
formance of maximum likelihood estimation) states, 
"Both theoretical and empirical examples indicate 
that estimates of structural parameters obtained 
from conventional procedures are very sensitive to 
the choice of mixing distributions." Other authors 
have suggested more flexible distributional assump- 
tions for the random effects to protect against mis- 
specification. Approaches include nonparametric ma- 
ximum likelihood (Aitkin, 1999; Agresti, Caffo and 
Ohman-Strickland, 2004), more flexible parametric 
distributions (Zhang et al., 2008), marginalized mi- 
xed effects models (Heagerty and Zeger, 2000), h-li- 
kelihood approaches that can be easily adapted to fit 
different distributions (Lee and Nelder, 2004), fami- 
lies of parametric distributions (Piepho and McCul- 
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loch, 2004), mixtures of normal distributions (Lesaf- 
fre and Molenberghs, 2001) and "smooth" nonpara- 
metric fits (Zhang and Davidian, 2001). A recent ar- 
ticle (Huang, 2009) encapsulates the sentiment used 
to justify these approaches: 

For computational convenience, random 
effects in GLMMs are almost routinely as- 
sumed to be normal. However, the nor- 
mality assumption may be unrealistic in 
some applications. . . . Early investigation 
to address this concern suggested that mis- 
specifying the models for the random ef- 
fects usually only results in a small amount 
of bias in the maximum likelihood estima- 
tors (MLEs) for the fixed effects (Neuhaus, 
Hauck and Kalbfleisch, 1992). However, 
more recently, many authors have found 
that likelihood-based inference can be se- 
verely affected if the random-effect model 
is misspecified. For example, Heagerty and 
Kurland (2001) computed the asymptotic 
bias in the MLEs for the parameters in a lo- 
gistic mixed model in four instances of ran- 
dom-effect model misspecification. They 
concluded that incorrect assumptions on 
the random effects can lead to substantial 
bias in the MLEs for the fixed effects. Ag- 
resti, Caffo and Ohman-Strickland (2004) 
conducted empirical studies on the impact 
of model misspecification for the random 
effects in GLMMs, showing that the MLEs 
for the fixed effects can be very sensitive 
to the assumed random effect model. Fi- 
nally, Litiere, Alonso and Molenberghs Li- 
tiere, Alonso and Molenberghs (2007) used 
simulation to show that the type I and 
type II errors of tests for the mean struc- 
ture in a logistic mixed model can be seri- 
ously affected by violations of the random- 
effect model. 

A body of work with a slightly different focus 
has been that of estimating the shape of the ran- 
dom effects distribution, also by hypothesizing more 
flexible distributional fits for the random effects. 
Though this is different from assessing misspecifica- 
tion, it is a closely related problem and those meth- 
ods may help diagnose misspecification in the situ- 
ations in which sensitivity to misspecification is of 
concern. Approaches to this problem have used mix- 
tures of normal distributions (Magder and Zeger, 



1996; Caffo, An and Rohde, 2007), and nonparamet- 
ric and "smooth" nonparametric fits (Laird, 1978; 
Davidian and Gallant, 1993; Zhang and Davidian, 
2001; Ghidey, Lesaffre and Eilers, 2004). See Ghidey, 
Lesaffre and Verbeke (2010) for a recent review ar- 
ticle in the context of linear mixed models. 

We use data from the Heart and Estrogen Re- 
placement Study (HERS) (Hulley et al., 1998) to 
illustrate some of our findings. HERS was a ran- 
domized, blinded, placebo controlled trial for women 
with previous coronary disease. The study enrolled 
2,763 women and followed them annually for five 
subsequent visits, generating longitudinal data. More 
detail is given in Section 11, but we foreshadow 
the results here. We fit logistic regression models 
with random intercepts to model whether or not 
a woman had high blood pressure as a function of 
her body mass index, whether she was on hyper- 
tensive medication, and a trend over the visits. For 
the random intercepts the models assumed one of 
four distributions: normal, a centered and scaled ex- 
ponential distribution, a Tukey(g, h) distribution or 
a 3 point discrete distribution, quite different para- 
metric assumptions. The regression parameters were 
very similar, despite evidence for differences in over- 
all model fit. How can we reconcile this with the 
claims above? 

We will argue that concerns over the paramet- 
ric specification of random effects distributions are 
sometimes valid, but ofttimes misplaced. This is due 
to three reasons: (1) sensitivities are restricted to 
aspects of the estimation that are not typically of 
interest, (2) the situations considered are unfairly 
extreme, or (3) that close scrutiny of published re- 
sults does not actually support sensitivity to mis- 
specification. 

Our paper is organized as follows. In the next sec- 
tion we outline a series of examples and their as- 
sociated inferences. We then describe the basic sta- 
tistical model. The sections that follow look at the 
impact of misspecification on various aspects of the 
model. The remaining sections consider some simu- 
lation studies, the HERS example (in more detail), 
some remarks on more complicated random effects 
structures, and we conclude with a brief summary. 

2. INFERENTIAL SETTINGS 

A key argument we make is that the robustness 
is dependent on the inferential setting. Perhaps the 
most common inference for which a regression model 
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is used is estimation and testing of regression coef- 
ficients associated with a covariate. In the clustered 
data setting, a useful distinction is whether a covari- 
ate is a "between-cluster" covariate, meaning that it 
is constant over the units in a cluster or a "within- 
cluster" covariate, meaning that it varies within 
a cluster, but has an average that is constant be- 
tween clusters. For example, in a longitudinal study 
with participants being clusters and all participants 
contributing data across 4 time points, the ethnic- 
ity of the subject would be a between-cluster co- 
variate and the visit number would be a within- 
cluster covariate. Of course, covariates are often nei- 
ther purely between or within, for example, the body 
mass index of a participant measured over time, in 
which case it is often useful for pedagological, sub- 
stantive or technical reasons to decompose a co- 
variate into its purely-between and purely-within 
components (Neuhaus and Kalbfleisch, 1998; Rau- 
denbush and Bryk, 2002; Neuhaus and McCulloch, 
2006). Other inferential goals include using the fitted 
model to generate predictions, generating predicted 
values of the random components of the model, or 
estimating aspects of the distribution of the ran- 
dom components of the model, for example, as in 
a variance components analysis (Searle, Casella and 
McCulloch, 1992). 

2.1 Examples 

To illustrate these distinctions, we begin with a se- 
ries of examples. 

2.1.1 Estimate the effect of a within- cluster co- 
variate Metlay et al. (2007) evaluated the effective- 
ness of an educational program that attempted to 
reduce the inappropriate use of antibiotics for con- 
ditions nonresponsive to antibiotics. The study was 
a cluster randomized trial, randomized at the level 
of the hospital, with 8 intervention and 8 control 
hospitals. Measurements were taken at each hospi- 
tal in the year before and after the interventions 
were introduced. The data were further clustered 
by physician within hospital. The primary inference 
involved a comparison of responses before and after 
the intervention, a within-cluster (= hospital) co- 
variate setting. 

2.1.2 Estimate the effect of a between-cluster co- 
variate In the previous example, 8 of the hospitals 
were Veteran's Administration (VA) hospitals and 8 
were non-VA. A secondary goal was to compare the 
rates of inappropriate usage between VA and non- 
VA hospitals, a between-cluster covariate setting. 



2.1.3 Derive predictions based on the fixed effects 
Auble et al. (2007) studied the performance of four 
clinical prediction rules for estimating the risk of 
death or serious complications for patients admit- 
ted to the hospital with a diagnosis of heart failure. 
The goal was to develop a rule that could strat- 
ify patients into low and high risk groups. One of 
the models used was a random effects logistic re- 
gression model accommodating clustering of out- 
comes by hospital. The inferential goal was to de- 
velop a prediction rule based on the fixed effects in 
the model. 

2.1.4 Predict random effects Zhang et al. (2008) 
wanted to estimate subject-specific rates of disease 
progression in chronic kidney disease patients. They 
estimated subject-specific slopes of the change in 
glomerular filtration rate (a measure of kidney func- 
tion) over time. They used a linear mixed model 
with clustering by subject and non- normally dis- 
tributed random effects. The inferential goal was to 
predict the realized values of the random effects. 

2.1.5 Estimate variance components Selby et al. 
(1996) studied the variation in the rates of coronary 
angiography (a diagnostic procedure using X-rays 
and a special dye to diagnose coronary artery dis- 
ease) across 16 hospitals. Significant variation after 
adjusting for differences in patient populations in- 
dicates over- or under-use of the procedure, leading 
to poor patient outcomes and/or wasted resources. 
The inferential goal was to estimate the variance of 
the hospital random effects after adjusting for pa- 
tient characteristics. 

2.2 Longitudinal and Clustered Studies 

Each of the above examples describes clustered 
or longitudinal studies, in which repeated observa- 
tions of the outcome are taken within clusters, for 
example, multiple measurements of kidney function 
on a patient. These are the types of scenarios we 
consider. 

An immediate distinction to make is the different 
scenario considered by Heckman and Singer (1984), 
in which there is only a single observation of the 
outcome per cluster (in their case, a time-to-event 
outcome). In the normal linear mixed model set- 
ting, with only a single observation per cluster, the 
random effect and error term are completely con- 
founded, leading to lack of identifiability. The pres- 
ence of identifiability in the Heckman and Singer 
situation arises only through strong parametric as- 
sumptions, so it is not surprising that results are 
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sensitive to the choice of distribution. Such settings 
do not represent longitudinal or clustered studies 
and we do not consider them further. 

3. A GENERALIZED LINEAR MIXED MODEL 

The model we will use to assess the impact of 
misspecification is a generalized linear mixed model 
for clustered data with random intercepts, b\. There 
are fewer results for more complicated covariance 
structures, such as those generated by random in- 
tercept and slope models, which we discuss briefly 
in Section 12. Let Ya represent the ith observation 
(i = 1, . . . , 7ij) within cluster i (i = 1, . . . , m). We as- 
sume that, conditional on the random effects, the Ya 
are independent: 

Yit\bi ~ independent Fy, 

i = l,...,m;t = l,...,m, 

(1) g(E[Y lt \b l ])=b l + x' tt f3, 

bi ~ i.i.d. F b , 

E[6j] = and var(6j) = 

where g(-) is a known link function, (3 is the param- 
eter vector for the fixed effects, and Xj t is a vector 
of covariates for cluster i at time t. We consider the 
performance of maximum likelihood for estimating 
the parameters. We focus especially on the situation 
where the assumed distribution is normal, since this 
is the most commonly implemented choice in popu- 
lar software packages such as SAS, Stata and R. 

The model, (1), contains a number of specifica- 
tions. The main specification we will consider is the 
shape of the random effects distribution, that is, the 
parametric form of the distribution, Fb, may be in- 
correctly specified. 

We have briefer comments on two other aspects 
of the specification. We may be concerned that: 

• Basic characteristics of the random effects distri- 
bution may depend on a covariate. For example, 
the mean or variance of Ff, depends on a covariate. 

• There is a dependence of the random effects dis- 
tribution on cluster sample size. For example, the 
mean of Fj, depends on rij. 

When the mean of the random effects distribution 
depends on a covariate, a fundamental relationship 
is introduced between the covariate and the distri- 
bution, potentially creating a serious bias in esti- 
mating the form of the relationship between the co- 
variate and the outcome. Neuhaus and McCulloch 



(2006) discuss reasons for this, suggest decompos- 
ing covariates of interest into within- and between- 
cluster components and investigate conditions un- 
der which this decomposition can remove or reduce 
bias. Heagerty and Kurland (2001) study situations 
where the variance of the random effects distribu- 
tion depends on a covariate and consider use of con- 
ditionally specified models (as we do here), as well 
as models that specify a marginal regression equa- 
tion (e.g., logistic regression for binary outcomes) 
but incorporate random effects in an underlying con- 
ditional model. They show that, for estimating the 
conditionally specified parameters of (1), the impact 
of highly unequal variances can lead to substantial 
bias. So both of these aspects of the specification 
of (1) are potentially important. 

Some authors have argued (Hoffman, Sen and 
Weinberg, 2001; Williamson, Datta and Satten, 2003) 
that when the cluster sample size is incorrectly as- 
sumed to be independent of the random effects dis- 
tribution, serious consequences may result. Neuhaus 
and McCulloch (2011) argue that this type of mis- 
specification is really a variation of incorrect as- 
sumptions about the mixing distribution. If one as- 
sumes that the parameters in the joint distribution 
of Ui and Xij are not functionally related to the pa- 
rameters of interest, namely, those in the distribu- 
tion of outcomes or random effects, then as a func- 
tion of P, Neuhaus and McCulloch (2011) show that 



/(y i5 ni,Xj) 




= /(yiK,Xj). 



This is a useful representation of the joint likeli- 
hood of yi , rii and Xj since analysts would typically 
model the distribution of the responses conditional 
on covariates and sample size. In particular, ignor- 
ing the association of cluster size with response, one 
would base inference on the incorrect likelihood built 
up of terms 




where the asterisks denote assumed distributions. 

Comparing equation (3), the assumed likelihood 
and (2), the true likelihood, we see that we can 
view (3) as arising from (2) but with a misspecified 
random effects distribution, namely, the conditional 
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distribution of b given rii and Xj is incorrectly spec- 
ified as fu(b). This is important because some of 
the comments we make below for random intercepts 
models concerning misspecification of the shape of 
the random effects distribution will also apply to 
incorrectly assuming that the cluster sample size is 
independent of the random intercept distribution. 
Situations in which the informative cluster size is 
related to the random slopes are more involved. 

We next consider in more detail the impact of 
shape misspecification of the random effects distri- 
bution, organized by the inferential targets. 

4. ESTIMATE/TEST A WITHIN-CLUSTER 
COVARIATE 

Virtually every study of the impacts of misspeci- 
fication has shown little impact on within-cluster co- 
variates. Neuhaus, Hauck and Kalbfleisch (1992), 
Neuhaus, Kalbfleisch and Hauck (1994) and Heager- 
ty and Kurland (2001) developed analytic results to 
assess the impact of misspecification of the random 
effects distribution using the theory of inference un- 
der misspecified models (Huber, 1967; Akaike, 1973; 
White, 1994). This theory shows that estimators ob- 
tained by maximizing the likelihood based on a mis- 
specified random effects distribution converge to val- 
ues that minimize the Kullback-Leibler divergence 
(Kullback, 1959) between the correctly specified and 
misspecified models. In cases such as the linear mixed 
effects model and binary matched pairs (Neuhaus, 
Kalbfleisch and Hauck, 1994), one can obtain closed 
form solutions for the values that minimize the Kull- 
back-Leibler divergence. In these cases, the closed 
form solution shows that the estimator based on mis- 
specified random intercepts is identical to the stan- 
dard conditional likelihood estimator, an estimator 
unaffected by the random effects distribution. In 
other cases, one can show consistent estimation at 
f3 = for generalized linear mixed models with mis- 
specified random intercept distributions. For exam- 
ple, Neuhaus, Hauck and Kalbfleisch (1992) show 
consistent estimation at /3 = for logistic models 
with misspecified random intercept distributions and 
Neuhaus and McCulloch (2011) extend the result to 
the entire class of generalized linear mixed models. 
In addition, one can obtain approximate solutions to 
Kullback-Leibler divergence minimizers using Tay- 
lor expansions (Neuhaus, Hauck and Kalbfleisch, 
1992). Using this device, Neuhaus, Hauck and Kalb- 
fleisch (1992) show little asymptotic bias in a logis- 
tic model with misspecified random intercepts. Hea- 
gerty and Kurland (2001), in their Table 1, show vir- 



tually no impact on the asymptotic bias in logistic 
regression models with a wide variety of gamma- 
distributed random effects. Chen, Zhang and Da- 
vidian (2002), in investigating bias and efficiency in 
logistic models, state that: 

Estimation of /?2, which corresponds to 
a covariate changing within individuals, 
suffers no loss of efficiency under misspec- 
ification of the random effects distribu- 
tion. Similar results have been reported 
by Tao et al. (1999) and Zhang and Da- 
vidian (2001). ... a within individual co- 
variate such as time is roughly 'orthogo- 
nal' to among-individual effects. Thus, es- 
timation of the associated regression coef- 
ficient may be less affected. 

The above cited Zhang and Davidian article, inves- 
tigating a linear mixed model, shows little impact 
on bias or efficiency. 

So for the inferential goal which is arguably the 
most relevant to clustered or longitudinal data, mis- 
specification of the shape of the random effects dis- 
tribution has little or no effect. 

5. ESTIMATE/TEST A BETWEEN-CLUSTER 
COVARIATE 

Between-cluster covariates might be expected to 
be more sensitive to shape specification. We agree 
with Chen, Zhang and Davidian (2002), who state, 
"We conjecture that, because a cluster-level covari- 
ate such as treatment and the latent random ef- 
fects both pertain to among-individual variation, 
misspecification of the random effects distribution 
would compromise quality of estimation of the cor- 
responding regression coefficient." However, the re- 
sults of Neuhaus, Hauck and Kalbfleisch (1992) and 
Neuhaus and McCulloch (2011) on consistent esti- 
mation at /3 = and the Taylor approximations of 
Neuhaus, Hauck and Kalbfleisch (1992) apply equally 
to within- and between-cluster covariates. In addi- 
tion, the simulation results of Heagerty and Kur- 
land (2001) show virtually no bias in estimates of 
between-cluster covariate effects. 

Via simulation, Agresti, Caffo and Ohman-Strick- 
land (2004), in their Table 3 and for a logistic model, 
showed moderate loss in efficiency when compar- 
ing assumed normal and two-point discrete distri- 
butions versus the same two true distributions with 
a small number of clusters (10). For a linear mixed 
model, Magder and Zeger (1996), in their Table 3, 
showed little or no loss in efficiency when assuming 
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a normal random effects distribution when the dis- 
tribution was actually skewed or bimodal, but mod- 
erate efficiency loss when the true distribution was 
a two-point discrete distribution. 

Of more concern are the results of Zhang and Davi- 
dian (2001), Litiere, Alonso and Molenberghs (2007) 
and Litiere, Alonso and Molenberghs (2008), who 
investigated performance under highly non-normal 
distributions, but not as extreme as a two-point 
discrete distribution. Zhang and Davidian (2001) 
present simulation results for a linear mixed model 
with a true mixture of normals distribution but fit 
with an assumed normal distribution and show loss 
of efficiency. Litiere, Alonso and Molenberghs (2007) 
present simulation results for a logistic mixed model 
under a true "Power function" distribution (a con- 
tinuous, but bounded range distribution) and as- 
sumed normal distribution and claimed a significant 
loss of power under an assumed normal distribution. 

The scenario simulated by Zhang and Davidian was 
a balanced, linear mixed model, comparing clustered 
responses between two levels of a binary between- 
cluster covariate. The usual maximum likelihood ba- 
sed analysis estimates the between group difference 
with the ordinary least squares estimate (Searle, Ca- 
sella and McCulloch, 1992), the difference in the 
group means. So the comparison simulated was es- 
sentially the performance of the t-test under non- 
normality. While the two-group t-test is well known 
to be highly robust (e.g., Rasch and Guiard, 2004), 
it is not impervious to non-normality and demon- 
strates loss of efficiency in this extreme violation of 
the normality assumption. However, the estimates 
are exactly unbiased. 

The Litiere, Alonso and Molenberghs (2007) and Li- 
tiere, Alonso and Molenberghs (2008) simulations in- 
vestigate the performance of tests that assume a nor- 
mal distribution for the random effects while varying 
the true distributions. This is useful for understand- 
ing the robustness of methods available in standard 
software (which often assume normally distributed 
random effects) when the true distribution is differ- 
ent than the assumed. 

However, this does not directly address the ques- 
tion of misspecification, which requires fixing the 
true distribution and varying the assumed distribu- 
tion. To see why this is the case, consider a hypo- 
thetical situation where the standard deviation of an 
estimated parameter is 1 when the true and assumed 
random effects distributions are normal, but is equal 
to 2 when the true and assumed random effects dis- 



tributions are t with 3 degrees of freedom. Further, 
suppose that, when the true distribution is t$, but 
we assume the distribution is normal, the standard 
deviation is 2.1. In this case, we would conclude 
that misspecification (assuming the distribution is 
normal when, in fact, it is £3) has little impact, de- 
creasing efficiency by 5%. But, if we compare the 
standard deviations under the two true distributions 
(1 under normality and 2.1 under £3), we might be 
tempted to incorrectly conclude that misspecifica- 
tion causes a large decrease in efficiency. In Neuhaus, 
McCulloch and Boylan (2011), we therefore consid- 
ered the same scenario as investigated in Litiere, 
Alonso and Molenberghs (2007), but we varied the 
assumed distribution. Contrary to their conclusions, 
our results showed that misspecification had virtu- 
ally no impact on power, except a very modest im- 
pact for the combination of small sample sizes with 
huge random effects variances (variances of 16 or 32 
on the logit scale). Litiere, Alonso and Molenberghs 
(2008) consider the situation of random intercepts 
and slopes, which we consider in Section 12. 

Similar results are true of investigations of infor- 
mative cluster sizes with random intercepts. Hoff- 
man, Sen and Weinberg (2001), Williamson, Datta 
and Satten (2003) and Benhin, Rao and Scott (2005) 
investigate the impact of informative cluster sizes on 
between-cluster covariates in logistic models via sim- 
ulation. Hoffman, Sen and Weinberg (2001) states 
that, "We will show by a simulated scenario that 
when cluster size is nonignorable the behaviour of 
the generalised estimating equations approach breaks 
down." and Williamson, Datta and Satten (2003) 
claim that, ". . . the usual generalized estimating 
equation approach resulted in severely biased esti- 
mates of both the marginal regression and associa- 
tion parameters." These suggest a general failure of 
generalized estimating equation approaches. How- 
ever, careful examination of their results shows that 
the poor performance is mainly isolated to the in- 
tercept terms. The simulation reported in Table 3 of 
Hoffman, Sen and Weinberg (2001) does not demon- 
strate bias for b±, the covariate effect, and simu- 
lations reported in Williamson, Datta and Satten 
(2003) show bias on the order of 5% (their Table 1) 
for covariate effects. Similar results hold for maxi- 
mum likelihood and random intercept models (Neu- 
haus and McCulloch, 2011). 

In summary, bias in estimates of between-cluster 
covariates has not been demonstrated. Efficiency loss 
has been demonstrated, but only in distributions 
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quite far from non-normality, indicating a high de- 
gree of robustness. The degree of robustness is sim- 
ilar to that of normality-based tests of means. 

6. ESTIMATION OF AN INTERCEPT 

In many cases, linear mixed models assuming nor- 
mality give unbiased or nearly unbiased estimates of 
all the fixed effect parameters including the inter- 
cept, even when the random intercepts distribution 
is non-normal. However, the same is not true for 
nonlinear models. Both theoretical and simulation 
studies have shown that estimates of the intercept 
may be biased when the random effects distribution 
is far from normal. Neuhaus, Hauck and Kalbfleisch 
(1992) give a local approximation for logistic mod- 
els relating bias of estimators of the intercept un- 
der assumed normality for the random intercepts 
to asymmetry of the random effects distribution. 
Heagerty and Kurland (2001) show asymptotic rel- 
ative bias on the order of 20% or more when the 
true distribution is gamma but the assumed nor- 
mal. 

Similar results hold for informative cluster sizes. 
Williamson, Datta and Satten (2003) demonstrate 
bias of approximately 30% in estimating the inter- 
cept using independence generalized estimating 
equations using simulations for a logistic model. Sim- 
ilarly, Neuhaus and McCulloch (2011) demonstrate 
modest bias in estimating the intercept using max- 
imum likelihood methods. They give a theoretical 
argument as to why bias is to be expected in the 
intercept but not other regression parameters. 

In summary, misspecification of the shape of the 
random effects distribution can introduce a moder- 
ate to large bias in estimation of the intercept. This 
will carry over to possible bias in estimation of the 
mean value of the outcome for fixed covariate val- 
ues. So, when inference focuses on mean estimation 
or the intercept, care should be taken with the spec- 
ification of the random effects distribution. 

7. PREDICTION OF THE RANDOM 
EFFECT, bi 

Much less work exists on the effect of misspecifi- 
cation on prediction of the random effects. Magder 
and Zeger (1996) demonstrated robustness of predic- 
tions as gauged by mean square error of prediction 
in linear mixed models and conclude, "Differences in 
the performances of the various methods in estima- 
tion of the random effects, b, were generally not very 
large." In simulating the performance of the poste- 



rior mean estimates in logistic models, Agresti, Caffo 
and Ohman-Strickland (2004) did not find a change 
in performance when the true distribution was var- 
ied between uniform, exponential and normal and 
the assumed distribution was normal. However, as 
noted above, this does not directly address the ques- 
tion of misspecification. They did find a number of 
situations in which assuming a normal distribution 
suffered a moderate loss of performance when the 
true distribution was a discrete two-point distribu- 
tion. Zhang et al. (2008) (their Table 2) showed 
only modest differences in the mean square error 
of prediction of random slopes and intercepts when 
assuming normality and under a true log gamma 
distribution for a linear mixed model. McCulloch 
and Neuhaus (2011) investigated both linear mixed 
models and logistic mixed models under a variety 
of assumed and true distributions, both by theory 
and simulation, and found only modest impacts to 
misspecification on the mean square error of predic- 
tion, using posterior mean predictions for the linear 
mixed models and posterior mode predictions for the 
logistic mixed models. Exceptions were when a dis- 
tribution with limited support was assumed, but the 
true distribution had a wider range of support and 
for large random effects variances and large cluster 
sizes. 

The shape of the distribution of the best predicted 
values is a different matter. A number of authors 
(Lesaffre and Molenberghs, 2001; Magder and Zeger, 
1996; Zhang and Davidian, 2001) have demonstrated 
convincingly it may not reflect the true underlying 
shape of the distribution of the bi, but instead the 
assumed distribution. We also demonstrate this in 
our example. So, although the performance of the 
best predicted values (as gauged by the mean square 
error of prediction) is robust, shapes of distributions 
of best predicted values are not. 

To recap, the shape of the distribution of the best 
predicted values is highly sensitive to the assumed 
form of the distribution. This is unfortunate be- 
cause it means that diagnostics based on the empir- 
ical distribution of the best predicted values (e.g., 
histograms or Q-Q plots) are unreliable. However, 
the performance of either posterior mean or pos- 
terior mode predicted values (as judged by overall 
mean square error of prediction) are generally ro- 
bust across a wide variety of assumed distributions. 
Some loss of efficiency was evidenced with large clus- 
ter sizes and large random effects variances, in which 
definition of the correct random effects distribution 
is clearer. 
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8. ESTIMATE THE RANDOM EFFECTS 
VARIANCE 

Again, with regard to estimation of the random ef- 
fects variance, there is less research. Neuhaus, Hauck 
and Kalbfieisch (1992) and Heagerty and Kurland 
(2001) demonstrated little effect on estimation of 
the random effects variance, with asymptotic rela- 
tive bias on the order of 15% or less. Agresti, Caffo 
and Ohman-Strickland (2004) demonstrated some 
situations, mostly with larger random effects vari- 
ances and large cluster sizes, when there was ef- 
ficiency loss with assuming a normal distribution, 
when the true distribution was a discrete, two-point 
distribution. Neuhaus and McCulloch (2011) show 
via simulation in the informative cluster size case 
that bias is slight. 

As noted, this aspect of misspecification has not 
received the scrutiny that the other aspects above 
have, perhaps because inference centering on the 
random effects variance is somewhat less common. 
As a result, the literature is far from definitive. Nev- 
ertheless, the basic picture is similar to that of co- 
variate effects: estimates of the random effects vari- 
ance appears relatively robust to misspecification of 
the random intercept distributional shape. 

9. MISSPECIFICATION OF OTHER ASPECTS 
OF THE MODEL 

We have mainly considered the situation in which 
only a single aspect of the model, the shape of the 
random intercepts distribution, has been misspeci- 
fied. In practice, of course, all assumptions are vi- 
olated to at least a minor degree. What is the im- 
pact of simultaneous misspecification of more than 
one aspect? There is little in the literature to re- 
port, but previous work does inform on two aspects. 
First, Neuhaus and McCulloch (2006) show that use 
of conditional likelihood methods and methods that 
separate covariates into between and within compo- 
nents eliminate or reduce bias for the within-cluster 
covariate in the situation where the random effects 
are associated with the covariates. The argument 
works by showing that estimation of the within- 
cluster covariate is essentially divorced from the ran- 
dom effects distribution when using these methods, 
also making it impervious to shape misspecification. 
Second, the arguments of Neuhaus and McCulloch 
(2011), showing that informative cluster sizes have 
little impact on the bias of regression coefficients, 
proceed by converting the informative cluster size 



problem into the misspecified random effects dis- 
tribution as displayed in (2). This same argument 
can be used to show that, even with simultaneous 
misspecification of the shape of the random effects 
distribution and informative cluster sizes, maximum 
likelihood estimators of the regression coefficients 
(other than the intercept) will be consistent in a li- 
near mixed model and, for ft = 0, consistent at zero 
for generalized linear mixed models. While not all- 
encompassing, this result for generalized linear mixed 
models is important for testing the null hypothesis 
that ft = 0. 

10. A SIMULATION STUDY 

We performed a simulation study to evaluate the 
performance of each of the above aspects of infer- 
ence. The goal was to gauge the performance of as- 
sumed normal fits to a distribution that was highly 
non-normal, but not as extreme as a two-point, dis- 
crete distribution. We chose a Tukey(<7, h) distribu- 
tion, since, depending on the values of g and h, 
the Tukey distribution can be quite skewed and/or 
heavy-tailed. See He and Raghunathan (2006) for 
a recent reference. We chose g = 0.5 and h = 0.1, 
which gives a mean of 0.31, variance of 2.27, skew- 
ness of 3.41 and a kurtosis of 44.24. 

The simulations used two covariates: one within- 
cluster and one between-cluster covariate. The within- 
cluster covariate was equally spaced between and 1. 
The between-cluster covariate was binary with a 25%/ 
75% division. The parameter values were: fto = —2.5, 
/^between = 2, /3 wit hin = 1, cr 6 = 1. We set the number 
of clusters, m, to 200 and used a variety of clus- 
ter sizes (n = 2,4, 6, 10,20 and 40). We simulated 
data under a logistic link and the Tukey(g, h) distri- 
bution. We ran 1,000 replications for each scenario 
and used common random numbers across different 
cluster sizes and fitted distributions to increase pre- 
cision of comparisons. We conducted simulations in 
SAS (Ver 9.1, SAS Institute, Cary NC) and fit mod- 
els using Proc NLMIXED. 

To each simulated data set we fit three GLMMs 
with a logistic link. The first model assumed that the 
random effects were standard normal and the second 
assumed they followed a standardized Tukey((/, h) 
distribution with unknown g and h. Especially with 
small cluster sizes, there is little or no information 
about the shape parameters of the Tukey distribu- 
tion and trying to estimate those parameters led to 
unstable estimation. To directly assess the effect of 
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• — • Tukey (known g, h), O—O Tukey (estimated g, h), Normal 

Fig. 1. Bias of estimators of the parameters of a logistic mixed effects model, (1), with Tukey(g,/i) distributed random 
intercepts, fit with assumed normal and Tukey distributions. 



misspecification and to avoid confounding the effects 
of an incorrect distributional shape with the estima- 
tion of the two additional parameters, we fit a third 
GLMM with the random effects following a Tukey 
distribution with g and h fixed at their true values of 
0.5 and 0.1, but still estimating the random effects 
variance. 

Figure 1 gives the results for the bias of the esti- 
mators. As expected, there was virtually no impact 
of using an assumed normal distribution in the es- 
timation of the within-cluster covariate. There was 
a modest impact in estimating the between covariate 
and the intercept (but with bias less than 5% for the 
between-cluster covariate and less than 10% for the 
intercept). Impact on estimation of the log standard 
deviation of the random effects was negligible. 

As noted, estimation of the g and h parameters 
for the Tukey distribution introduced instability in 
estimating the log standard deviation of the random 
effects. This led to poor estimation of the intercept 
for cluster size n = 2 and for the log standard de- 
viation for cluster sizes n = 2,4,6 and 10. Perhaps 
surprisingly, the estimation of g and h had little im- 
pact on estimation of /^between- 

Fitting these models is quite involved, so we give 
some further details on the simulation. Not all the 
replications in the simulations achieved convergence 
via NLMIXED. Fitting the assumed normal and 
Tukey distribution with known g and h had ex- 
cellent convergence rates, all above 95%. However, 



the Tukey distribution with estimated g and h was 
numerically challenging and convergence rates for 
n = 4 and 6 were lower (82.5% and 83.8%, respec- 
tively), with estimation of logo";, especially problem- 
atic. The figures give results for all the runs, using 
final parameter values in cases with nonconvergence. 
Comparing the convergent runs only with all the 
runs showed little impact on the bias of the param- 
eters in Figure 1, excepting log o^. Simulation code 
is available from the authors. 

Figure 2 gives the results for the standard devi- 
ations of the estimators. The standard deviations 
of the estimators were nearly the same under both 
the fitted normal and Tukey distributions across all 
the cluster sizes and for all the parameters, with 
the exception being a modest loss of efficiency for 
the between-cluster covariate, especially with larger 
cluster sizes. So misspecifying the random effects 
distribution produced essentially no loss in estima- 
tion efficiency. 

Since estimation of between-cluster covariates is 
an important inferential goal in clustered data set- 
tings and there is a modest impact of misspecify- 
ing the distribution as normal, we give more de- 
tail on the actual distributions of the estimates of 
the between-cluster covariate effect. Figure 3 shows 
boxplots of the estimates under the assumed nor- 
mal and Tukey (with fixed g and h) fits. Behavior 
of the assumed normal fit is slightly worse than the 
true Tukey fit for cluster sizes of 10 and 20 and 



10 



C. E. MCCULLOCH AND J. M. NEUHAUS 



Within Covariate 



Between Covariate 




10 20 30 

Cluster size 



10 20 30 

Cluster size 



Intercept 




10 20 30 

Cluster size 



Log SD of b 




10 20 30 

Cluster size 



Tukey (known g, h), G— © Tukey (estimated g, h), 



Fig. 2. Standard deviation of estimators of the parameters of a logistic mixed effects model, (1), with Tukey (g,h) distributed 
random intercepts, fit with assumed normal and Tukey distributions. 
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Fig. 3. Boxplots of estimates of ^between from a logistic Fig. 4. Mean square error of prediction of the random ef- 
mixed effects model, (1), with Tukey(p,/i) distributed random fects from a logistic mixed effects model, (1), with Tukey(g,/i) 



intercepts, fit with assumed normal and Tukey distributions. 
Reference line given at the true value of /3b e tween = 2. 



distributed random intercepts, fit with assumed normal and 
Tukey distributions. 



somewhat more variable for a cluster size of 40. 
For smaller cluster sizes the results are comparable. 
Given the extreme differences between a normal and 
Tukey(0.5, 0.1) distribution, this represents a large 
degree of robustness, especially for cluster sizes 6 or 
smaller. 

SAS Proc NLMIXED calculated best predicted 
values as modes of the log of /(yi|xj, &i)/b(&i|(T 6 ). 
Figure 4 shows the mean squared error of predic- 
tion under the three fitted models. Compared to 
the Tukey distribution with estimated g and h, the 



misspecified normal actually outperforms it up until 
cluster sizes of 10. Compared to the Tukey distribu- 
tion with fixed g and h, the performance is slightly 
worse, with the inflation in mean square error of 
prediction ranging from about 5% at n = 2 to about 
20% for the larger cluster sizes. Again, given the ex- 
treme differences between a normal and Tukey, this 
is a high degree of robustness. 

These results support the broad conclusion that, 
for distributional shapes quite different from the nor- 
mal, estimation of the intercept may be biased. How- 
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Table 1 

HERS model fit comparisons with different assumed random effect distributions. 
Model-based standard errors for the fixed effects are given as subscripts 



Random effects 
distribution 



Parameter estimates 



-2 loglik Intercept Visit 



BMI 



HTN log(<x b ) 



Normal 
Exponential 
Discrete 
Tukey 



3695 -4.28o.4i O.860.05 0.024 .oi3 -0.36 .ie 0.49 

3732 -3.96o.38 0.84 . 05 0.023 .oi3 -0.30 .i4 0.19 

3674 -4.05o.38 0.87o.os 0.021 .oi 2 -0.37 .i 6 0.23 

3677 -4.10i.oe O.860.05 0.022 .oi3 -0.36o.ie 0.28 



ever, the bias for estimating other parameters was 
low and efficiency high. For prediction of random ef- 
fects, the mean square error of prediction was mod- 
estly increased when incorrectly assuming normal- 
ity, especially for larger cluster sizes. 

11. EXAMPLE— HERS 

HERS was a randomized, blinded, placebo control- 
led trial for women with previous coronary disease. 
The study enrolled 2,763 women and followed them 
annually for five subsequent visits. We will consider 
only the subset of N = 1,378 that were not diabetic 
and who had systolic blood pressure less than 140 
at the beginning of the study. We treat HERS as 
a prospective cohort study using the first four vis- 
its. We modeled the binary outcome of high blood 
pressure (HBP), defined as a systolic blood pressure 
greater than 140, as a function of visit (numerical 
through 3), body mass index (BMI) and whether or 
not the participant was on high blood pressure med- 
ication at that visit (HTN). The visit trend variable 
is essentially a within-person covariate and BMI and 
HTN are mostly between-person covariates (with 
95% and 76%, respectively, of their variability as- 
sociated with between-person differences). 

With HBPj£ equal to 1 if woman i at visit t had 
high blood pressure (and otherwise), our random 
intercepts logistic regression model was 

logit(P{HBP l4 = 1}) 

= A) + b 0i + ft visit* + /3 2 BMI it + /3 3 HTN it , 

(4) 

where &o« ~ i-i-d. J\f(0, a%) or 

boi ~ i.i.d. ab{Exp(l) — 1} or 

&0i ~ i.i.d. discrete with three mass points or 

6 i ~ i.i.d. a b {Tukey(g,h)}. 

We fit models via maximum likelihood to the data 
using, in turn, each of the assumed random effects 



distributions given in (4). We used SAS Proc 
NLMIXED (SAS Institute, Cary, NC) for the con- 
tinuous distributions, which gives posterior mode es- 
timates of the boi. We used the Stata module 
GLLAMM (www.gllamm. org) to fit the three point 
discrete discrete distribution, which gives posterior 
mean estimates of the &oi- 

We chose the exponential (Exp) and Tukey distri- 
butions since they are parametric but quite different 
from the normal. The exponential distribution has 
scale parameter 1, a bounded range, and is skewed 
right. The Tukey distribution is a flexible paramet- 
ric family that can take on a wide variety of shapes 
(He and Raghunathan, 2006); it includes the normal 
as a special case (g = h = 0) but also includes dis- 
tributions with extreme skewness and kurtosis. The 
discrete distribution is similar to a nonparametric 
maximum likelihood fit, but specifies a priori the 
number of mass points. 

Table 1 lists the fitted coefficients and maximized 
log likelihood values. As expected, the fixed effects 
parameter estimates are quite similar, especially the 
"within" covariate, Visit, even though there are mod- 
est differences in the fits of the models as judged 
by the value of the maximized log likelihood. The 
shape parameters for the Tukey distribution had 
large standard errors but were estimated to be 
g = 2.5 and h = —1.8, which is a distribution with 
bounded range, and is slightly skewed right and hea- 
vy tailed (skewness of about 0.9 and kurtosis of 
about 2.3). 

Figure 5 gives histograms of the best predicted 
values of the random intercept deviations, &oi> un- 
der each of the above assumed distributions. As is 
evident from the figure, the shape of the distribution 
of the best predicted values is dependent on the as- 
sumed random effects distribution, with the discrete 
and exponential distributions especially having dif- 
ferent shapes than the normal and Tukey, which are 
similar to one another. 
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Fig. 5. Best predicted values for the HERS data under different assumed random effects distributions. 



12. RANDOM INTERCEPTS AND SLOPES 

The remarks above pertain mostly to models with 
random intercepts only. An understudied area is what 
happens with the further complication of random 
coefficient models, such as random intercepts and 
slopes. The line of argument cited above using min- 
imization of Kullback-Leibler divergence (Neuhaus, 
Hauck and Kaibneisch, 1992; Neuhaus, Kalbfleisch 
and Hauck, 1994; Heagerty and Kurland, 2001) can 
also be used in the random intercepts and slopes 
situation. This line of argument shows, for the lin- 
ear mixed model, that estimates of fixed effects pa- 
rameters will be consistent. However, for the linear 
mixed model this is well known (Verbecke and Lesaf- 
fre, 1997) by alternate means. For generalized linear 
mixed models, results analogous to the random in- 
tercepts case hold. For fixed effects for which there is 
a corresponding, misspecified random effects distri- 
bution, bias can result. But fixed effects orthogonal 
to random effects are little affected. For example, if 
a covariate is both a fixed effect and a random ef- 
fect (i.e., a random slope), the estimate of the fixed 
effect for that covariate may exhibit bias when the 
distribution of the random slope is misspecified. But 
estimates of covariate effects, orthogonal to the one 
which is random, are little affected. For the informa- 
tive cluster size situation, Neuhaus and McCulloch 
(2011) make these arguments more precise. In the 
case of prediction of realized values or random ef- 
fects, McCulloch and Neuhaus (2011) report limited 



results for random intercepts and slopes; they show 
a fair degree of robustness for mean square error of 
prediction and comparable behavior to the random 
intercepts case. 

Contrary to the above, Litiere, Alonso and Molen- 
berghs (2008) describe some simulations for the ran- 
dom intercepts and slopes situation that report to 
show significant bias for a mixed effects logistic mo- 
del with true normal or true mixture of normal dis- 
tributions to which an assumed bivariate normal 
distribution is fit. The situation they simulate is 
quite extreme. The true bivariate mixture distribu- 
tion corresponding to their least extreme configura- 
tion, which they call Vi, is a mixture of standard 
normals with mean values at plus or minus 2 for 
both the random intercept and slope; this generates 
a distribution (in two dimensions) with two isolated 
peaks. The standard deviation on the logit scale 
for both the intercepts and slopes for this model 
is v / 5~2.24. 

For the random intercept, a random effect one 
standard deviation above its mean of would have 
an odds of the outcome that is exp(2.24) ps 9.36 
higher than a random effect at its mean of 0. So 
this is a moderately large effect. The effect of a ran- 
dom slope one standard deviation above its mean 
of depends on the values of the associated covari- 
ate, which range from to 8 in their simulation. 
So a random slope one standard deviation above its 
mean of would have the same odds of the outcome 
at t = compared to an "average" random slope, 
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Table 2 

Medians of maximum likelihood estimates of fixed effects from fitting model (5) assuming a bivariate normal distribution 
when the true distribution is normal or a mixture of bivariate normals. /3o is the intercept, /3b is the between- cluster 
coefficient and f3 w is the within- cluster coefficient. In each case the random intercepts had variance 5 and a correlation with 
the random slopes of 0.9. The random slopes variance was either 5 (upper panel) or the more reasonable value of 0.08 (lower 
panel). The simulation generated 500 replications with 100 clusters of size 6 



Median estimates (true values) 





True dist'n 


Simulation 


0o (-6) 


/3b (2) 


(1) 


Convergence rate 


var(6„,i) = 5 


Normal 


Current 


-6.26 


2.13 


1.07 


98% 




Normal 


Litiere et al. 


-6.14 


2.04 


1.04 


not given 




Mixture 


Current 


-6.34 


2.87 


0.86 


99% 




Mixture 


Litiere et al. 


-10.52 


2.52 


-0.13 


<43% 


vax(b wi ) = 0.08 


Normal 


Current 


-6.18 


2.12 


1.04 


100% 




Mixture 


Current 


-5.89 


2.26 


0.95 


100% 



but at t = 8 would have an odds of the outcome of 
exp(8\/5) ~ 58,700,000 times higher. Furthermore, 
the correlation between the random intercepts and 
slopes was 0.9. The other scenarios they simulate 
are even more extreme! A more reasonable value for 
the random slope variance would be approximately 
0.08. At t = 8 this would generate an odds of an 
outcome which is exp(8- v / 0.08) ~ 9.61 times higher, 
an effect comparable in magnitude to the random 
intercepts. 

They also report poor convergence rates when fit- 
ting the assumed bivariate normal distribution to 
the bivariate mixture distribution. We performed 
a small simulation to check their results and to as- 
sess the magnitude of the bias under the highly non- 
normal but more reasonable situation where the ran- 
dom slopes variance was 0.08. We used the same 
model as Litiere, Alonso and Molenberghs (2008), 
namely, 

io g it(p{y 4t = i}) 

(5) 

= (A) + b i) + /3 b Zi + (P w + b wi )tj, 

with Z{ binary with equal proportions of and 1, 
tj = 0, 1,2,4,6, and 8 (within each cluster of size 6) 
and 100 clusters. We performed 500 replications and 
fit the models via maximum likelihood using SAS 
Proc NLMIXED assuming a bivariate normal distri- 
bution. Table 2 reports the results. When fitting to 
a true normal distribution, the median values when 
the random slope variance is 5 (upper panel) are 
remarkably close to the true values. When fitting 
to a true mixture distribution, the median values 
exhibit bias, but not nearly as extreme as those re- 
ported by Litiere, Alonso and Molenberghs (2008). 
Convergence rates when fitting to the mixture dis- 
tribution are much better as well. Our results are 



consistent with previous literature, in that the im- 
pact on the within-cluster coefficient (3 W is less ex- 
treme than the between-cluster coefficient, (3^. Un- 
der the more reasonable random slopes variance of 
0.08 there is only a small amount of bias. Of course, 
it is important to remember that, even under the 
more reasonable value of the variance, the mixture 
model is a highly non-normal bivariate distribution 
with most of its mass concentrated around (—2, —2) 
and (2, 2). 

13. SUMMARY 

We considered robustness to the assumed distri- 
bution for a random intercept when using maximum 
likelihood methods for fitting a generalized linear 
mixed model such as (1). In practice, data analysts 
commonly assume normality for the random effects. 
Theory and simulation studies indicate that most 
aspects of statistical inference are highly robust to 
this assumption. Especially robust were inferences 
for within-cluster covariate effects, which are often 
a key reason for considering a clustered data design. 

These conclusions are contrary to much of the 
previous literature. We have argued that this is be- 
cause (a) results for the nonclustered data situation 
(e.g., Heckman and Singer, 1984) are incorrectly in- 
terpreted as relevant to the clustered data setting, 
(b) results restricted to one portion of the model 
(primarily the intercept) are characterized as a gen- 
eral failure of the methodology, and (c) reanalysis 
of previous scenarios has sometimes given different 
interpretations or results. 

Exceptions to this robustness include estimation 
of the intercept, which can be biased when the ran- 
dom intercept distribution is misspecified. The shape 
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of the estimated random effects distribution (based 
on the distributional shape of the predicted random 
effects) can also be quite sensitive to the shape of 
the assumed distribution and thus not reflect the 
true distribution. 

However, a wide array of inferences are quite ro- 
bust to this type of misspecification, including es- 
timation of covariate effects, estimation of the ran- 
dom effects variance and the mean square error of 
prediction of the realized value of random effects. 
We also argued that this robustness extends to ran- 
dom intercept situations where the cluster size is 
informative. 

Between-cluster covariate effects and estimation 
of the log of the standard deviation of the random 
effects were found to be robust, but not impervious, 
to misspecification of the random effects distribu- 
tion. The literature and results here indicate that 
a modest amount of bias and minor loss of efficiency 
can occur, especially when the true distribution is 
far from the assumed (e.g., assuming a normal dis- 
tribution when the true distribution is a two-point 
distribution or the Tukey distribution investigated 
here), the random effects are large, and the cluster 
size is large. When random effects and cluster sizes 
are large, the data provide a fair amount of infor- 
mation as to the shape of the random effects distri- 
bution and correctly specifying the random effects 
distribution gives some advantage. 

Particularly robust was estimation of within-clus- 
ter covariate effects. This is especially important be- 
cause exploitation of within-cluster comparisons is 
often the rationale for conducting longitudinal or 
clustered data studies. Virtually all the evidence in 
the literature and the new results reported here in- 
dicate that within-cluster covariate effects are esti- 
mated with no more bias when fitting a misspecified 
model than when fitting the correct model and with 
little or no loss of efficiency. 

ACKNOWLEDGMENTS 

We thank Ross Boylan for computational assis- 
tance with the simulation studies and Stephen Hul- 
ley for use of the HERS data set. Support was pro- 
vided by NIH Grant R01 CA82370. 

REFERENCES 

Agresti, A., Caffo, B. and Ohman-Strickland, P. 
(2004). Examples in which misspecification of a random 
effects distribution reduces efficiency, and possible reme- 
dies. J. Comput. Graph. Statist. 47 639-653. MR2100566 



Aitkin, M. (1999). A general maximum likelihood analysis 
of variance components in generalized linear models. Bio- 
metrics 55 117-128. MR1705676 

Akaike, H. (1973). Information theory and an extension 
of the maximum likelihood principle. In Second Interna- 
tional Symposium on Information Theory (B. N. Petrov 
and F. Czaki, eds.) 267-281. Akademiai Kiado, Budapest. 
MR0483125 

Auble, T. E., Hsieh, M., McCausland, J. B. and 
Yealy, D. M. (2007). Comparison of four clinical predic- 
tion rules for estimating risk in heart failure. Annals of 
Emergency Medicine 50 127-135. 

Benhin, E., Rao, J. N. K. and Scott, A. J. (2005). 
Mean estimating equation approach to analysing cluster- 
correlated data with nonignorable cluster sizes. Biometrika 
92 435-450. MR2201369 

Butler, S. M. and Louis, T. A. (1992). Random effects 
models with non-parametric priors. Stat. Med. 11 1981- 
2000. 

Caffo, B., An, M. and Rohde, C. (2007). Flexible ran- 
dom intercept models for binary outcomes using mixtures 
of normals. Comput. Statist. Data Anal. 51 5220-5235. 
MR2370867 

Chen, J., Zhang, D. and Davidian, M. (2002). A Monte 
Carlo EM algorithm for generalized linear mixed models 
with flexible random effects distribution. Biostatistics (Ox- 
ford) 3 347-360. 

Davidian, M. and Gallant, A. R. (1993). The nonlinear 
mixed effects model with a smooth random effects density. 
Biometrika 80 475-488. MR1248015 

Ghidey, W., Lesaffre, E. and Eilers, P. (2004). Smooth 
random effects distribution in a linear mixed model. Bio- 
metrics 60 945-953. MR2133547 

Ghidey, W., Lesaffre, E. and Verbeke, G. (2010). A com- 
parison of methods for estimating the random effects dis- 
tribution of a linear mixed model. Stat. Methods Med. Res. 
19 565-600. 

He, Y. and Raghunathan, T. (2006). Tukey's gh distribu- 
tion for multiple imputation. Amer. Statist. 60 251-256. 
MR2246758 

Heagerty, P. J. and Kurland, B. F. (2001). Misspeci- 
fied maximum likelihood estimates and generalised linear 
mixed models. Biometrika 88 973-985. MR1872214 

Heagerty, P. J. and Zeger, S. L. (2000). Marginalized 
multilevel models and likelihood inference (with comments 
and a rejoinder by the authors). Statist. Sci. 15 1-26. 
MR1842235 

Heckman, J. and Singer, B. (1984). A method for mini- 
mizing the impact of distributional assumptions in econo- 
metric models for duration data. Econometrica 52 271-320. 
MR0735309 

Hoffman, E. B., Sen, P. K. and Weinberg, C. R. 
(2001). Within-cluster resampling. Biometrika 88 1121- 
1134. MR1872223 

Huang, X. (2009). Diagnosis of random-effect model mis- 
specification in generalized linear mixed models for binary 
response. Biometrics 65 361-368. MR2751459 

Huber, P. J. (1967). The Behavior of Maximum Likelihood 
Estimates Under Nonstandard Conditions. In Proc. Fifth 
Berkeley Sympos. Math. Statist, and Probability (Berkeley, 



MISSPECIFYING A RANDOM EFFECTS DISTRIBUTION 



15 



Calif., 1965/66) 1 (L. M. Le Cam and J. Neyman, eds.) 
221-223. Univ. California Press, Berkeley. MR0216620 
Hulley, S., Grady, D., Bush, T., Furberg, C, Her- 

RINGTON, D., RlGGS, B. and VlTTINGHOFF, E. (1998). 
Randomized trial of estrogen plus progestin for secondary 
prevention of coronary heart disease in postmenopausal 
women. Heart and Estrogen/progestin Replacement Study 
(HERS) Research Group. Journal of the American Medical 
Association 280 605-613. 

Kullback, S. C. (1959). Information Theory and Statistics. 
Wiley, New York. MR0103557 

Laird, N. (1978). Nonparametric maximum likelihood esti- 
mation of a mixing distribution. J. Amer. Statist. Assoc. 
73 805-811. MR0521328 

Lee, Y. and Nelder, J. A. (2004). Conditional and marginal 
models: Another view. Statist. Set. 19 219-238. MR2140539 

Lesaffre, E. and Molenberghs, G. (2001). Multivariate 
probit analysis: A neglected procedure in medical statistics. 
Stat. Med. 10 1391-1403. 

Litiere, S., Alonso, A. and Molenberghs, G. (2007). 
Type I and type II error under random-effects misspeci- 
fication in generalized linear mixed models. Biometrics 63 
1038-1044. MR2414580 

Litiere, S., Alonso, A. and Molenberghs, G. (2008). The 
impact of a misspecified random-effects distribution on the 
estimation and the performance of inferential procedures in 
generalized linear mixed models. Stat. Med. 27 3125-3144. 
MR2522153 

Magder, L. S. and Zeger, S. L. (1996). A smooth non- 
parametric estimate of a mixing distribution using mix- 
tures of Gaussians. J. Amer. Statist. Assoc. 91 1141-1151. 
MR1424614 

McCulloch, C. E. and Neuhaus, J. M. (2011). Prediction 
of random effects in linear and generalized linear models 
under model misspecification. Biometrics 67 270-279. 

McCulloch, C. E., Searle, S. R. and Neuhaus, J. M. 
(2008). Generalized, Linear and Mixed Models, 2nd ed. Wi- 
ley, New York. MR2431553 

Metlay, J. P., Camargo, C. A., Mackenzie, T., McCul- 
loch, C. E., Maselli, J., Levin, S. K., Kersey, A., 
Gonzales, R. and the IMPAACT Investigators 
(2007). Cluster-randomized trial to improve antibiotic use 
for adults with acute respiratory infections treated in emer- 
gency departments. Annals of Emergency Medicine 50 221- 
230. 

Neuhaus, J. M., Hauck, W. W. and Kalbfleisch, J. D. 
(1992). The effects of mixture distribution misspecification 
when fitting mixed-effects logistic models. Biometrika 79 
755-762. 

Neuhaus, J. M., Kalbfleisch, J. D. and Hauck, W. W. 
(1994). Conditions for consistent estimation in mixed- 
effects models for binary matched pairs data. Canad. J. 
Statist. 22 139-148. MR1271451 



Neuhaus, J. M. and Kalbfleisch, J. D. (1998). Between- 
and within-cluster covariate effects in the analysis of clus- 
tered data. Biometrics 54 638-645. 

Neuhaus, J. M. and McCulloch, C. E. (2006). Separating 
between and within-cluster covariate effects using condi- 
tional and partitioning methods. J. Roy. Statist. Soc. Ser. 
B 68 859-872. MR2301298 

Neuhaus, J. M., McCulloch, C. E. and Boylan, R. 
(2011). A note on type II error under random effects mis- 
specification in generalized linear mixed models. Biomet- 
rics 67 654-656. 

Neuhaus, J. M. and McCulloch, C. E. (2011). Estimation 
of covariate effects in generalised linear mixed models with 
informative cluster sizes. Biometrika 98 147-162. 

Piepho, H.-P. and McCulloch, C. E. (2004). Transforma- 
tions in mixed models: Application to risk analysis for a 
multienvironment trial. J. Agric. Biol. Environ. Statist. 9 
123-137. 

Rasch, D. and Guiard, V. (2004). The robustness of para- 
metric statistical methods. Psychology Science 46 175-208. 

Raudenbush, S. and Bryk, A. (2002). Hierarchical Linear 
Models: Applications and Data Analysis Methods, 2nd ed. 
Sage Publications, Thousand Oaks. 

Searle, S. R., Casella, G. and McCulloch, C. E. (1992). 
Variance Components. Wiley, New York. MR1 190470 

Selby, J. V., Fireman, B. H., Lundstrom, R. J., 
Swain, B. E., Truman, A. F., Wong, C. C, Froe- 
licher, E. S., Barron, H. V. and Hlatky, M. A. (1996). 
Variation among hospitals in coronary-angiography prac- 
tices and outcomes after myocardial infarction in a large 
health maintenance organization. New England Journal of 
Medicine 335 1888-1896. 

Tao, H., Palta, M., Yandell, B. S. and Newton, M. A. 
(1999). An estimation method for the semiparametric 
mixed effects model. Biometrics 55 102-110. MR1705675 

Verbecke, G. and Lesaffre, E. (1997). The effect of mis- 
specifying the random-effects distribution in linear mixed 
models for longitudinal data. Comput. Statist. Data Anal. 
23 541-556. MR1437679 

White, H. (1994). Estimation, Inference, and Specification 
Analysis. Cambridge Univ. Press, Cambridge. MR1292251 

Williamson, J. M., Datta, S. and Satten, G. A. (2003). 
Marginal analyses of clustered data when cluster size is 
informative. Biometrics 59 36-42. MR1978471 

Zhang, D. and Davidian, M. (2001). Linear mixed models 
with flexible distribution of random effects for longitudinal 
data. Biometrics 57 795-802. MR1859815 

Zhang, P., Song, P. X. K., Qu, A. and Greene, T. (2008). 
Efficient estimation for patient-specific rates of disease pro- 
gression using nonnormal linear mixed models. Biometrics 
64 29-38. MR2422816 



